





Auto Learning Attention: Supplementary Material

Neural Information Processing Systems

We use ResNet50 as the backbone to evaluate the performance of different attention modules. The conv5_x, average pooling, fully connected (fc), and softmax layers are removed from the original classification model. The initial learning rate is 0.1, the weight decay is 0.0005, and the batch size is 256. The results are summarised in Table 3 of the paper.
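The truncation and hyperparameters above can be sketched in plain Python as a bookkeeping exercise. This is a minimal sketch under stated assumptions: the names `TRAIN_CONFIG`, `RESNET50_LAYERS`, `STAGE_OUTPUT`, `truncate`, and `feature_shape` are hypothetical; the channel counts and strides are those of the standard ResNet50, not values reported in this supplement; and the optimizer and learning-rate schedule are not stated in the text, so they are left out.

```python
# Hyperparameters stated in the text; optimizer and schedule are not specified.
TRAIN_CONFIG = {"initial_lr": 0.1, "weight_decay": 0.0005, "batch_size": 256}

# Layers of the original ResNet50 classification model, in forward order.
RESNET50_LAYERS = ["conv1", "conv2_x", "conv3_x", "conv4_x", "conv5_x",
                   "avgpool", "fc", "softmax"]

# (output channels, cumulative stride w.r.t. the input image) of each
# convolutional stage in the standard ResNet50; conv2_x includes the max pool.
STAGE_OUTPUT = {
    "conv1": (64, 2),
    "conv2_x": (256, 4),
    "conv3_x": (512, 8),
    "conv4_x": (1024, 16),
    "conv5_x": (2048, 32),
}

# Layers removed from the classification model, as listed in the text.
REMOVED = {"conv5_x", "avgpool", "fc", "softmax"}


def truncate(layers, removed):
    """Drop the named layers, keeping the remaining ones in forward order."""
    return [name for name in layers if name not in removed]


def feature_shape(layers, height, width):
    """(channels, height, width) of the map produced by the last layer.

    Expects the last layer to be a convolutional stage and the input size
    to be divisible by its cumulative stride.
    """
    channels, stride = STAGE_OUTPUT[layers[-1]]
    return channels, height // stride, width // stride


backbone = truncate(RESNET50_LAYERS, REMOVED)
print(backbone)                           # ['conv1', 'conv2_x', 'conv3_x', 'conv4_x']
print(feature_shape(backbone, 224, 224))  # (1024, 14, 14)
```

With conv5_x removed, the backbone ends at conv4_x, so a 224 × 224 input yields a 1024-channel map at 1/16 resolution rather than a class distribution.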


Auto Learning Attention

Benteng Ma


Attention modules have been shown to be effective in strengthening the representation ability of a neural network by reweighting spatial or channel features, or by stacking both operations sequentially. However, designing the structures of different attention operations requires a large amount of computation and extensive expertise.
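To make the two reweighting operations concrete, here is a minimal, parameter-free sketch in plain Python. It is a generic illustration in the spirit of SE-style channel attention and CBAM-style channel-then-spatial stacking, not the method of this paper, which searches for the attention structure. Real modules compute the gates with learned layers; those are omitted here, and all function names are hypothetical.

```python
import math
from typing import List

# A feature map stored as nested lists indexed [channel][row][column].
FeatureMap = List[List[List[float]]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(x: FeatureMap) -> FeatureMap:
    """Reweight each channel by a gate computed from its global average.

    Simplification: SE-style modules pass the pooled vector through learned
    layers before the sigmoid; here the gate is simply sigmoid(mean).
    """
    out = []
    for channel in x:
        values = [v for row in channel for v in row]
        gate = sigmoid(sum(values) / len(values))
        out.append([[v * gate for v in row] for row in channel])
    return out


def spatial_attention(x: FeatureMap) -> FeatureMap:
    """Reweight each spatial position by a gate from its cross-channel mean."""
    num_channels = len(x)
    height, width = len(x[0]), len(x[0][0])
    gates = [
        [sigmoid(sum(x[c][i][j] for c in range(num_channels)) / num_channels)
         for j in range(width)]
        for i in range(height)
    ]
    return [
        [[x[c][i][j] * gates[i][j] for j in range(width)] for i in range(height)]
        for c in range(num_channels)
    ]


def sequential_attention(x: FeatureMap) -> FeatureMap:
    """Stack both operations: channel reweighting first, then spatial."""
    return spatial_attention(channel_attention(x))


x = [[[1.0, -1.0], [0.5, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
y = sequential_attention(x)
print(len(y), len(y[0]), len(y[0][0]))  # 2 2 2
```

Because every gate lies in (0, 1), each operation only rescales features and leaves the tensor shape unchanged, which is what allows the two operations to be stacked in either order.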